AITopics | rl policy

Scaffolding Dexterous Manipulation with Vision-Language Models

Neural Information Processing Systems, Jun-13-2026, 03:22:07 GMT

Dexterous robotic hands are essential for performing complex manipulation tasks, yet remain difficult to train due to the challenges of demonstration collection and high-dimensional control. While reinforcement learning (RL) can alleviate the data bottleneck by generating experience in simulation, it typically relies on carefully designed, task-specific reward functions, which hinder scalability and generalization. Thus, contemporary works in dexterous manipulation have often bootstrapped from reference trajectories. These trajectories specify target hand poses that guide the exploration of RL policies and object poses that enable dense, task-agnostic rewards. However, sourcing suitable trajectories, particularly for dexterous hands, remains a significant challenge. Yet, the precise details in explicit reference trajectories are often unnecessary, as RL ultimately refines the motion.

artificial intelligence, name change, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.67)

Accelerating Reinforcement Learning Training Using Simulation Surrogate Models

Ghasemloo, Mohammadmahdi, Eckman, David J., Li, Yaxian

arXiv.org Machine Learning, May-28-2026

High-fidelity simulation models are widely used to analyze complex stochastic systems, but their high computational cost motivates the development of cheaper surrogate models that approximate the simulation model's input-output relationship. In parallel, reinforcement learning (RL) has emerged as a powerful framework for making online decisions in stochastic environments, with increasing attention being given to the use of simulation models as training environments for RL models. We investigate a class of surrogate models suitable for accelerating RL training in settings where the reward structure, model parameters, or system dynamics change over time and explore their interactions with simulation models and RL models. Through numerical experiments on a stochastic service system modeled via discrete-event simulation, we demonstrate that leveraging surrogate models can substantially accelerate RL training and re-training.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv:2605.27556

Country:

North America > United States > Texas (0.14)
North America > United States > New York (0.14)
North America > United States > New Jersey (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report (0.50)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.35)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

48db71587df6c7c442e5b76cc723169a-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 17:46:40 GMT

default action, machine learning, natural language, (20 more...)

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Scaling up multi-agent systems: an interview with Minghong Geng

AIHub, Apr-21-2026, 13:37:45 GMT

In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. Minghong Geng recently completed his PhD and is now working as a postdoctoral researcher at Singapore Management University. We sat down to discuss his research on multi-agent systems.

Firstly, congratulations on completing your PhD! What is the general topic of your research?

I work on multi-agent systems.

artificial intelligence, multi-agent system, singapore management university, (14 more...)

Country: Asia > Singapore (0.26)

Genre: Personal > Interview (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Neural Information Processing Systems, Mar-21-2026, 19:17:19 GMT

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is that the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic, as small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: first, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learnt policies more resilient to sensor noise or adversarial attacks and thereby improving the suitability of deep reinforcement learning for real-world applications.

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Towards Effective Planning Strategies for Dynamic Opinion Networks

Neural Information Processing Systems, Feb-18-2026, 18:04:16 GMT

Our experimental results demonstrate that the ranking algorithm-based classifiers provide plans that enhance infection rate control, especially with increased action budgets for small networks.

infection rate, machine learning, reinforcement learning, (17 more...)

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > South Carolina (0.04)
Asia > Middle East > Oman > Muscat Governorate > Muscat (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Government > Voting & Elections (0.67)
Media > News (0.53)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Young, Rory, Pugeault, Nicolas (School of Computing Science, University of Glasgow)

Neural Information Processing Systems, Feb-16-2026, 23:24:59 GMT

Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is that the learnt policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks.

machine learning, reinforcement learning, trajectory, (18 more...)

Country: North America > Puerto Rico > San Juan > San Juan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

c0e19ce0dbabbc0d17a4f8d4324cc8e3-Paper.pdf

Neural Information Processing Systems, Feb-10-2026, 23:56:29 GMT

failure trajectory, safety specification, trajectory, (14 more...)

Country:

Asia > India > West Bengal > Kharagpur (0.04)
North America > United States > Montana (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

VLMLight: Safety-Critical Traffic Signal Control via Vision-Language Meta-Control and Dual-Branch Reasoning Architecture

Wang, Maonan, Chen, Yirong, Pang, Aoyu, Cai, Yuxin, Chen, Chung Shue, Kan, Yuheng, Pun, Man-On

arXiv.org Artificial Intelligence, Dec-12-2025

Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods, ranging from rule-based heuristics to reinforcement learning (RL), often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.

large language model, machine learning, vlmlight, (18 more...)

arXiv:2505.19486

Country: Asia > China (0.47)

Genre: Research Report > New Finding (0.67)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control

Stewart, Kenneth, Chapin, Samantha, Leontie, Roxana, Henshaw, Carl Glen

arXiv.org Artificial Intelligence, Dec-4-2025

Reinforcement learning (RL) offers transformative potential for robotic control in space. We present the first on-orbit demonstration of RL-based autonomous control of a free-flying robot, the NASA Astrobee, aboard the International Space Station (ISS). Using NVIDIA's Omniverse physics simulator and curriculum learning, we trained a deep neural network to replace Astrobee's standard attitude and translation control, enabling it to navigate in microgravity. This successful deployment demonstrates the feasibility of training RL policies terrestrially and transferring them to space-based applications. This paves the way for future work in In-Space Servicing, Assembly, and Manufacturing (ISAM), enabling rapid on-orbit adaptation to dynamic mission requirements. Future ISAM missions require increasingly autonomous robotic systems capable of adapting to the dynamic and uncertain conditions of space.

machine learning, reinforcement learning, variation, (18 more...)

arXiv:2512.03736

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (0.93)
Government > Space Agency (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
